Abstract

In this research endeavor, it was hypothesized that the sounds produced by animals
during their vocalizations can be used as identifiers of the animal's breed or species, even
if they sound the same to the unaided human ear. To test this hypothesis, three artificial
neural networks (ANNs) were developed using bioacoustic properties as inputs for the
automatic identification of 13 bird species, eight dog breeds, and 11 frog species, respectively.
Recorded vocalizations of these animals were collected and processed using several
well-known signal processing techniques to convert the sounds into computable bioacoustic
values. The converted values of the vocalizations, together with the breed or species
identities, were used to train the ANNs following a ten-fold cross-validation technique.
Tests show that the respective ANNs can correctly identify 71.43% of the birds, 94.44% of
the dogs, and 90.91% of the frogs. These results show that bioacoustics and ANNs can be
used to automatically determine animal breeds and species, which together could be a
promising automated tool for animal identification, biodiversity determination, animal
conservation, and other animal welfare efforts.


1. Introduction 


Identification of animal breeds or species¹ is an important method in animal conservation,
biodiversity determination, animal welfare efforts, animal breeding, and other human programs
that are geared towards production, improvement, protection, and conservation. The usual
method of identifying animal species is visual inspection of the animal's anatomical features
vis-a-vis a published set of standards (Coile, 2005; Ford and Cannatella, 1993; Stettenheim, 2000).
Examples of anatomical features inspected are body dimensions, types and colors of coats, and
skin covering. This method is usually performed by experts in the field, particularly breeders
and systematists. Identifying the species of birds that migrate to a certain place, for example,
requires a tedious bird-watching procedure that is often conducted over several days. To
identify the visiting bird species, the observers need to actually see the animals and capture
their images with a high-resolution camera. Obtaining an unobstructed line of sight between
the observer and the observed requires proper positioning of the observer at a safe distance.
Some observers go to the extent of wearing camouflage so as not to disturb the observed. This
procedure becomes much more complicated if the animals to be observed are nocturnal, are
perched atop high canopy trees, or are underwater.

¹ For brevity, the term “species” is used henceforth to mean either “breeds” or “species”
depending on the context, without loss of specificity.

The problems with the current method result from its inherent requirement that the observer
must have an unobstructed line of sight to the observed, with ample lighting. This is because
light travels along a straight line and can only be detected by human eyes within a certain
range of intensities (Marriott et al., 1959). Because humans are predominantly visual in nature
(Thorpe et al., 1996), most of their activities rely heavily on the sense of sight. They use the
other senses only to verify what was seen (Ernst and Banks, 2002). Identifying an animal, for
example, starts with obtaining an image of its anatomical features, either with the naked eye
or with a camera system. The physical process of obtaining an image consists of facing the
sensor (e.g., an eye, a camera, or another image-capturing device) towards the light rays
reflected off the surface of the animal. The identification is optionally verified when the
observed animal produces an identifiable vocalization that is heard by the observer. However,
the animal may not be seen (i.e., the reflected light does not reach the sensor) or may be
partially occluded (i.e., the reflected light is blocked by another object). In either case no
image can be obtained, and thus no identification can be performed, even if the vocalization
was heard by the observer.

In recent years, researchers have directed their efforts towards utilizing animal vocalizations
to detect and identify animals (Agranat, 2009; Aide et al., 2013; Clemins, 2005; Clemins and
Johnson, 2002; Shapiro, 2009). This gives rise to the importance of bioacoustics, the study of
animal vocalization. Detecting and identifying animals through their vocalizations does not
have the problematic inherent requirements of identification through visual means. Sound
waves propagate from their sources in all directions. They do not require an unobstructed line
of sight, because the waves can bounce from object to object. Additionally, the observer does
not have to find where the observed animals are located in order to orient a sensor. The ear, or
any other listening device, can sense sound waves from any direction and position. Because
of this, most biodiversity recording efforts start from what is first heard in the field rather than
what is first seen.

Animals vocalize to communicate information to other animals, whether within the same
species or with other species (Witzany, 2014). The reasons for communicating include, among
many others, (1) to impress and attract the opposite sex for reproduction, (2) to declare
territorial boundaries, (3) to identify family members, (4) to warn others of the presence of a
predator, and (5) to inform others of the location of a food source. For example, birds sing and
frogs croak to attract potential mates, while dogs bark to warn their human masters of the
presence of strangers.


Without the aid of hearing devices, the vocalizations of most animals within the same species
sound the same to humans. Researchers have found that human ears are sensitive to
low-frequency sounds, while most animals are sensitive to high-frequency ones (Masterton et al.,
1969). Other researchers have found that, through evolution, the … survival pressure from their
predators (Barber and Conner, 2007; Barber and Kawahara, 2013; Conner and Corcoran, 2012;
Igic et al., 2015). However, because humans are the ultimate predator, human hearing has
evolved to best detect sounds created by other human beings only (Nelken et al., 2005). As a
result, two animals from the same genus but of different species may produce sounds that the
human ear cannot tell apart. For this reason, vocalization was long not considered by
researchers as a primary identifier of species. However, recent advances in sound technology,
coupled with the development of complex signal processing … within the same animal
species is possible (…anik et al., 2006; …, 2009; Slabbekoorn and Smith, 2002; Witzany, 2013;
Moore et al., 2006; Shapiro, 2009).


The use of vocalization as an identifier of species is based on the framework that animals use
the acoustic properties of sound to encode their identities. The spectral features of sound
waves, including fundamental ones such as frequency and harmonics, differ between two
different sources. Because of this, Cortopassi and Bradbury (2000) were able to differentiate
pairs of orange-fronted parakeets (Aratinga canicularis) using the spectral features of bird calls.
Similarly, the mother-pup pairs of South American fur seals (Arctocephalus australis),
Galapagos fur seals (Arctocephalus galapagoensis), and Galapagos sea lions (Zalophus
californianus wollebaeki) were also differentiated via the spectral features of their calls
(Phillips and Stirling, 2000; Trillmich, 1981). Based on this framework, the following null
($H_0$) and alternate ($H_1$) hypotheses are stated for this effort:

$H_0$: There does not exist any combination of the spectral properties of the vocalizations of
animals that can differentiate breeds or species with at least 70% accuracy.

$H_1$: There does exist a combination of the spectral properties of the vocalizations of animals
that can differentiate breeds or species with at least 70% accuracy.

The minimum accuracy level in the above hypotheses was chosen as 20 percentage points above
a random coin toss. An unbiased coin toss is correct 50% of the time, and the additional 20% is
attributed to the confidence that the system will provide. The choice of 70% is arbitrary,
because neither the discipline nor the industry has set a standard for an acceptable accuracy
level.
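The coin-toss figure describes a guess between two outcomes. The sketch below compares the 70% threshold with the accuracy of a uniform random guess, which is 1/n for n balanced classes. It assumes the class counts given in Section 2.1.4 (14 bird, 9 dog, and 12 frog classes, each including the pseudo class) and assumes the classes are balanced.

```python
# Class counts per dataset, including the added "pseudo" class (assumed balanced).
CLASS_COUNTS = {"bird": 14, "dog": 9, "frog": 12}
THRESHOLD = 0.70  # minimum accuracy used in the hypotheses


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random among n balanced classes."""
    if n_classes < 1:
        raise ValueError("need at least one class")
    return 1.0 / n_classes


if __name__ == "__main__":
    for name, n in CLASS_COUNTS.items():
        p = chance_accuracy(n)
        print(f"{name}: {n} classes, chance {p:.2%}, threshold is {THRESHOLD / p:.1f}x chance")
```

For two classes the uniform guess reduces to the 50% coin toss. As the number of classes grows, the gap between the threshold and chance widens.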

In the past, the process of differentiating objects based on a complex combination of their
visual features was automated by machine vision systems (MVS) that use an artificial neural
network (ANN) for efficient classification. ANNs are abstract models of the human brain. They
are capable of supervised learning and of forming a generalization from a set of experiences
(McCulloch and Pitts, 1943; Rosenblatt, 1958). ANNs, for example, have been used by local
researchers to automate error-prone object classification tasks performed by humans. Examples
include grading the ripeness of tomatoes (Lycopersicon esculentum) (de Grano and Pabico,
2007a,b) and differentiating cracks, bloodstains, and dirt in eggs (Zarsuela and Pabico, 2007).
All of these automated systems used simple cameras as artificial eyes. The cameras extended
the seeing capabilities of human eyes beyond their normal working time and far above their
normal working rate. As a result, the object classification process was made faster, almost
error-free, and less resource-intensive. This significantly improves classification efficiency
over manual classification (Pabico et al., 2008, 2009, 2012).


In this effort, ANNs automatically identified the vocalizations of three groups of animals
from the spectral properties of those vocalizations. The properties were extracted following
various techniques developed in the bioacoustics and signal processing disciplines. Some of
these properties are Spectral Centroid (SC), Spectral Flux (SF), Spectral Roll-off Frequency
(SRF), Zero Crossing Rate (ZCR), Mel-Frequency Cepstral Coefficients (MFCC), and Linear
Predictive Coding (LPC) (McEnnis et al., 2005). Combinations of these properties were used to
train three independent ANNs to identify 13 bird species, eight dog breeds, and 11 frog species,
respectively. These animals were chosen not only because they are among the most common
animals in the Philippines. The Philippine mountains and forests also play host to most known
migratory bird species and are home to vast and varied groups of amphibians and reptiles. In
fact, the country ranks fourth in the world in terms of bird endemism and first in terms of
amphibians and reptiles (Ong et al., 2002). Dogs, on the other hand, were included because
Filipinos in general are dog lovers (Gosling et al., 2010). To avoid Type III errors in
classification, each ANN was trained using a 10-fold cross-validation technique (Kohavi, 1995;
Mosteller, 1948).


An ANN that uses a combination of all 28 quantified spectral properties as its input yielded the
highest accuracy rate, 71.43%, for identifying bird species. The ANN trained to identify frog
species also uses all 28 spectral properties and yielded a 90.91% accuracy rate. The ANN for
identifying dog breeds required a combination of only four spectral properties to yield an
accuracy rate of 94.44%. These results show that there does exist a combination of the spectral
properties of animal vocalizations that can differentiate breeds or species with a high accuracy
rate. One implication of this result is that a smartphone-enabled “crowdsourcing” solution may
become possible in the near future. In such a solution, ordinary people would automatically and
transparently collect information on fauna biodiversity. This could significantly enhance data
collection, not only for biodiversity inventory efforts, but also for other animal conservation and
welfare endeavors.























































Table 1: The bird species used in vocalization identification showing the common name,
scientific name, and place of origin.

No.  Common Name                 Scientific Name           Place of Origin
1.   Bananaquit                  Coereba flaveola          South America
2.   Black Crake                 Amaurornis flavirostra    Africa
3.   Black Hornbill              Anthracoceros malayanus   Southeast Asia
4.   Eurasian Skylark            Alauda arvensis           Europe, Asia, Africa
5.   European Goldfinch          Carduelis carduelis       Europe
6.   Philippine Bulbul           Hypsipetes philippinus    Mindanao
7.   Philippine Bush Warbler     Cettia seebohmi           Luzon
8.   Philippine Drongo Cuckoo    Surniculus velutinus      Mindanao
9.   Water Pipit                 Anthus spinoletta         Europe, Asia
10.  Rufous-tailed Hummingbird   Amazilia tzacatl          America
11.  Rusty Breasted Cuckoo       Cacomantis sepulcralis    Southeast Asia
12.  Spotted Kingfisher          Actenoides lindsayi       Luzon
13.  White Rumped Shama          Copsychus malabaricus     Southeast Asia


2. Materials and Methods 


2.1. Collecting audio clips of animal vocalizations 

2.1.1. Vocalization from birds 


Vocalization data from 13 bird species were used in this study. Basic information on these bird
species is shown in Table 1. Each bird species had 25 different audio clips, all of which came
from the database of Michigan State University's Avian Vocalizations Center (AVoCet, 2008).
The total number of audio clips used was 325.


2.1.2. Vocalization from frogs 


Sound data from 11 frog species were collected from different Internet databases (AmphibiaWeb,
2000). The species are Boophis luteus, Bufo marinus, Fejervarya limnocharis, Hylarana
glandulosa, Kaloula baleata, Kaloula pulchra, Microhyla butleri, Odorrana hosii, Phrynoidis
aspera, Polypedates leucomystax, and Rana catesbeiana. There were a total of 110 audio
samples, equally distributed among the 11 species.


2.1.3. Vocalization from dogs 

Bark recordings of Beagle, Chihuahua, Chow Chow, Shih Tzu, and Poodle were manually obtained
from reputable pet shops. Bark recordings of Labrador Retriever and Siberian Husky came from
various contributed audio repositories of previous experiments (Jones and Gosling, 2005; Riede
and Fitch, 1999). Ten audio segments of dog barks were used for each class, for a total of 90
samples (nine classes, counting the pseudo breed described in Section 2.1.4).


2.1.4. Recordings of negative examples 


Negative examples are sound clips that were not produced by birds, dogs, or frogs. These clips
are an important part of the dataset. They allow the ANN to differentiate not only between
species but also from sounds from other sources (Aha et al., 1991; Dietterich and Michalski,
1983). In this research effort, a pseudo “species” or “breed” was added to each dataset. That is,
a “pseudo species” was added to the bird dataset, a “pseudo breed” to the dog dataset, and a
“pseudo species” to the frog dataset. Thus, the bird, dog, and frog datasets have 14 species,
9 breeds, and 12 species, respectively.


2.1.5. Training, test, and evaluation datasets 

The collected dataset ($\mathcal{E}$) was divided into three sets, namely the training
($\mathcal{E}_{\mathrm{train}}$), test ($\mathcal{E}_{\mathrm{test}}$), and evaluation
($\mathcal{E}_{\mathrm{eval}}$) sets. If there are a total of $N$ samples in $\mathcal{E}$, then $N$
is as defined in Equation 1. Here $N_{\mathrm{train}}$, $N_{\mathrm{test}}$, and $N_{\mathrm{eval}}$
are the total numbers of samples in the training, test, and evaluation sets, respectively. Note
that $\mathcal{E}$ is as defined in Equation 2 and that the sets are pairwise disjoint, such that
Equations 3, 4, and 5 hold. The fractions of the dataset are $N_{\mathrm{train}} = 0.7N$,
$N_{\mathrm{test}} = 0.1N$, and $N_{\mathrm{eval}} = 0.2N$, with all species equally distributed in
each fraction.


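$$N = N_{\mathrm{train}} + N_{\mathrm{test}} + N_{\mathrm{eval}} \tag{1}$$

$$\mathcal{E} = \mathcal{E}_{\mathrm{train}} \cup \mathcal{E}_{\mathrm{test}} \cup \mathcal{E}_{\mathrm{eval}} \tag{2}$$

$$\emptyset = \mathcal{E}_{\mathrm{train}} \cap \mathcal{E}_{\mathrm{test}} \tag{3}$$

$$\emptyset = \mathcal{E}_{\mathrm{train}} \cap \mathcal{E}_{\mathrm{eval}} \tag{4}$$

$$\emptyset = \mathcal{E}_{\mathrm{test}} \cap \mathcal{E}_{\mathrm{eval}} \tag{5}$$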
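The split can be sketched as follows, assuming only that each sample's species label is available. The function name `stratified_split` and the per-species rounding rule are illustrative choices, not the authors' implementation. Under that rule, evaluation and test counts are rounded half up and the remainder goes to training. With 25 clips per bird class, the sketch yields 5 evaluation clips per class, consistent with the row totals in Table 2.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into disjoint training, test, and evaluation sets.

    Every class contributes approximately the same fractions to each set.
    Test and evaluation counts are rounded half up; the rest goes to training.
    """
    _, f_test, f_eval = fractions
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test, evaluation = set(), set(), set()
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n = len(idxs)
        n_eval = int(f_eval * n + 0.5)
        n_test = int(f_test * n + 0.5)
        evaluation.update(idxs[:n_eval])
        test.update(idxs[n_eval:n_eval + n_test])
        train.update(idxs[n_eval + n_test:])
    return train, test, evaluation


if __name__ == "__main__":
    labels = [species for species in range(14) for _ in range(25)]  # 14 classes x 25 clips
    tr, te, ev = stratified_split(labels)
    print(len(tr), len(te), len(ev))
```

Because the three index sets are filled from disjoint slices of each class's shuffled list, Equations 3 to 5 hold by construction.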


2.2. Extraction of spectral properties of sound waves 


The spectral properties of the audio clips were automatically extracted using a software system
called jAudio (McEnnis et al., 2005). The physical principles, mathematical derivations, and
proofs behind these spectral properties are discussed in detail elsewhere (Bogert et al., 1963;
Proakis and Manolakis, 2007) and are not presented here. However, for the benefit of lay
readers and for completeness, a brief description of each property is provided as follows:

1. Mel-Frequency Cepstral Coefficients (MFCC): Represents the short-term power spectrum of a
sound and is primarily used in speech recognition.

2. Zero Crossing (ZC): The number of times that the time-domain signal crosses zero within a
given window.

3. Root Mean Square (RMS): Calculated per window in order to get the amplitude of the sound
signal.

4. Fraction of Low Energy Window Frames (FLWEF): Indicates the variability of the amplitude
of windows.

5. Spectral Flux (SF): Signifies the degree of change of the spectrum between windows,
measured by comparing the spectra of adjacent windows.

6. Spectral Rolloff (SR): Indicates the skew of the frequencies present in a window. Eighty-five
percent of the energy in the spectrum lies below this frequency.

7. Compactness (C): Indicates the noisiness of the signal by getting the summation of the
frequency bins of the Fast Fourier Transform (FFT).

8. Method of Moments (MoM): Composed of the first five statistical moments that describe the
shape of the spectrum of a given window. The components are area (zeroth order), mean (first
order), Power Spectrum Density (second order), Spectral Skew (third order), and Spectral
Kurtosis (fourth order).

9. Linear Predictive Coding (LPC): Calculates the linear predictive coefficients of a signal, in
which each sample value is estimated by a linear function of the previous values.

10. Spectral Centroid (SC): Indicates where the “center of mass” of the spectrum is and measures
the brightness of the sound. This is used as an automatic measure of timbre.

11. Beat Sum (BS): Indicates the sum of the beats in a sound. Here, beats are the alternating
constructive and destructive interference caused by sound waves of different frequencies.

12. Strongest Beat (SB): The value of the beat with the strongest frequency.

13. Strength of Strongest Beat (SSB): The intensity of the strongest beat.

14. Spectral Variability (SV): Measures how the ranges of the elements of the sound differ from
each other.
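Several of the window-level properties above can be illustrated with common textbook definitions. The NumPy sketch below covers ZC, RMS, SC, SR at the 85% energy point, and SF. It is not jAudio's code, and jAudio's exact windowing and normalizations may differ. In particular, the flux here is one common variant: the Euclidean distance between consecutive magnitude spectra. `frame_signal` and `window_features` are hypothetical names.

```python
import numpy as np


def frame_signal(x, win=512):
    """Split a 1-D signal into consecutive, non-overlapping windows of `win` samples."""
    n_frames = len(x) // win
    return np.reshape(x[: n_frames * win], (n_frames, win))


def window_features(x, sr, win=512, rolloff_fraction=0.85):
    """Per-window ZC, RMS, spectral centroid, roll-off, and flux (textbook forms)."""
    frames = frame_signal(np.asarray(x, dtype=float), win)
    mags = np.abs(np.fft.rfft(frames, axis=1))      # magnitude spectrum of each window
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)         # frequency (Hz) of each FFT bin

    # Zero crossings: count sign changes between consecutive samples.
    signs = np.signbit(frames).astype(int)
    zc = np.abs(np.diff(signs, axis=1)).sum(axis=1)

    # RMS amplitude of each window.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = (mags @ freqs) / np.maximum(mags.sum(axis=1), 1e-12)

    # Roll-off: lowest bin frequency below which `rolloff_fraction` of the energy lies.
    energy = np.cumsum(mags ** 2, axis=1)
    rolloff = freqs[np.argmax(energy >= rolloff_fraction * energy[:, -1:], axis=1)]

    # Flux: Euclidean distance between consecutive magnitude spectra (0 for window 0).
    flux = np.concatenate([[0.0], np.linalg.norm(np.diff(mags, axis=0), axis=1)])

    return {"zc": zc, "rms": rms, "centroid": centroid, "rolloff": rolloff, "flux": flux}


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t + 0.3)   # 1 kHz test tone, one second long
    feats = window_features(tone, sr)
    print(feats["centroid"][:3], feats["rolloff"][:3])
```

For a steady pure tone, the centroid and roll-off both sit at the tone's frequency and the flux stays near zero. Bird, dog, and frog calls vary from window to window, which is what these features are meant to capture.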

The overall average and standard deviation of each of these properties were used as quantified
inputs to the ANN. This results in a total of 28 (14 × 2) quantified properties as identifiers of
species.
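The aggregation step can be sketched as collapsing a per-window feature matrix into one input vector. This sketch uses NumPy's default population standard deviation (`ddof=0`). Whether jAudio's aggregator uses the population or the sample form is an assumption not settled by the text. `summarize_clip` is a hypothetical name.

```python
import numpy as np


def summarize_clip(per_window):
    """Collapse a (windows x properties) matrix into one input vector for the ANN:
    the mean of every property followed by its standard deviation."""
    m = np.asarray(per_window, dtype=float)
    if m.ndim != 2 or m.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of shape (windows, properties)")
    return np.concatenate([m.mean(axis=0), m.std(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(40, 14))   # hypothetical: 40 windows x 14 properties
    print(summarize_clip(clip).shape)  # 14 means followed by 14 standard deviations
```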



2.3. Structuring, Training and Evaluating ANNs 

2.3.1. Structuring ANNs 


An ANN is an abstract mathematical model of the biological (specifically human) brain. It is
composed of a directed network of simple functions and coefficients. Here, the functions are the
nodes of the network and the coefficients are the weights on its edges. A function $f_1$ is
connected to another function $f_2$ via a weighted edge represented by the coefficient $a_{1,2}$.
The output of $f_1$ is multiplied by the coefficient $a_{1,2}$, and their product becomes an
input to $f_2$. The network in this study is directed from the quantified spectral properties as
inputs to the final species identity as output. The identity is encoded as a computable value,
represented by an $n$-bit binary code for an $n$-species identification problem.


In general, the nodes are structured into $m + 2$ layers, which fall into three major classes:
one input layer $L_0$, $m \geq 1$ hidden layers $L_1, L_2, \ldots, L_m$, and one output layer
$L_{m+1}$. Nodes within a layer are not connected to each other. Here, $L_0$ is composed of at
most 28 nodes representing the 28 quantified spectral properties. The output layer is composed
of $n$ nodes, where $n - 1$ is the number of species to identify. The extra node is for identifying
the negative examples, i.e., the “pseudo species” or “pseudo breed.” Each of the $m$ hidden
layers is composed of the same number of nodes $k$. Nodes in $L_i$ are connected to nodes in
$L_{i+1}$. The sub-graph induced by the nodes in $L_i$ and $L_{i+1}$ forms a fully connected
weighted bipartite network with weights $a_{i,i+1}$, $\forall i$. The functions in the nodes are
sigmoidal, $f_{i+1} : \{O_i\} \times \{a_{i,i+1}\} \rightarrow [0, 1]$. Here $f_{i+1}$ are the functions
in $L_{i+1}$, and $O_i \in [0, 1]$ are the outputs of $f_i$ in $L_i$.
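A minimal NumPy sketch of this layered structure follows: j inputs, m hidden layers of k sigmoid nodes each, and n sigmoid outputs. The n outputs are decoded by taking the largest activation. Per-node bias terms are an assumption of the sketch, since the text describes only edge weights. `init_ann`, `forward`, and `decode` are hypothetical names, and the species labels in the usage block are placeholders.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_ann(j, k, m, n, seed=0):
    """Random weights for j inputs, m hidden layers of k nodes each, and n outputs.
    Each layer is a (weights, biases) pair; biases are an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    sizes = [j] + [k] * m + [n]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Feed inputs forward: each node outputs sigmoid(weighted sum of the previous layer)."""
    out = np.asarray(x, dtype=float)
    for weights, biases in layers:
        out = sigmoid(out @ weights + biases)
    return out


def decode(outputs, species):
    """Map one output vector (n activations) to the species with the largest activation."""
    return species[int(np.argmax(outputs))]


if __name__ == "__main__":
    species = [f"species_{i}" for i in range(13)] + ["pseudo"]   # hypothetical labels
    net = init_ann(j=28, k=14, m=1, n=14)
    x = np.random.default_rng(1).normal(size=28)                 # one feature vector
    print(decode(forward(net, x), species))
```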


The ANNs in this work are denoted as $\mathrm{ANN}(j, [k, m], n)$. Here $j$ is the number of nodes
in $L_0$, $k$ is the number of nodes in each hidden layer, $m$ is the number of hidden layers, and
$n$ is the number of nodes in $L_{m+1}$. The ANNs for birds, dogs, and frogs are structured as
$\mathrm{ANN}_{\mathrm{bird}}(j, [14, 1], 14)$, $\mathrm{ANN}_{\mathrm{dog}}(j, [9, 1], 9)$, and
$\mathrm{ANN}_{\mathrm{frog}}(j, [12, 1], 12)$, respectively. The optimal value of $j$ for each ANN
was obtained using a stepwise forward substitution method following the minimum description
length criterion for model selection (Hansen and Yu, 2001).
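The stepwise forward procedure can be sketched as a greedy loop over candidate properties. The scoring function is left abstract. `score` stands in for the minimum-description-length criterion of Hansen and Yu (2001), whose exact form is not reproduced here. `forward_select` is a hypothetical name, and the toy score in the usage block is purely illustrative.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(features: Sequence[str],
                   score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedy stepwise forward selection (lower score is better).

    Starting from the empty set, repeatedly add the single feature whose inclusion
    lowers the score the most; stop as soon as no remaining feature lowers it.
    Returns the selected features in the order they were added.
    """
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    order: List[str] = []
    remaining = list(features)
    while remaining:
        candidate_score, candidate = min((score(selected | {f}), f) for f in remaining)
        if candidate_score >= best:
            break
        selected = selected | {candidate}
        best = candidate_score
        order.append(candidate)
        remaining.remove(candidate)
    return order


if __name__ == "__main__":
    # Toy, hypothetical score: a fixed error reduction per feature plus a size penalty,
    # loosely mimicking a "fit + complexity" trade-off.
    gain = {"MFCC_avg": 3.0, "MoM_avg": 2.0, "ZC_avg": 0.1}

    def toy_score(subset):
        return 10.0 - sum(gain[f] for f in subset) + 0.5 * len(subset)

    print(forward_select(list(gain), toy_score))
```

The complexity penalty is what lets such a procedure stop before using every property. The same loop can therefore end at all 28 properties for one dataset and at only a few for another.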


2.3.2. Training ANNs 

Each of the ANNs was trained using a feed-forward, back-propagation training algorithm over the
samples in $\mathcal{E}_{\mathrm{train}}$. The set of coefficients $\{a\}$ is initially assigned at
random, and the resulting errors are minimized by back-propagating the identification
differences. An example of such a difference is the ANN classifying an audio clip as vocalized by
species A when it was actually vocalized by species B. The ANN error used for training was the
mean square error (MSE). The process was conducted iteratively until one of the following
stopping criteria was met:

1. When the ANN MSE over the samples in $\mathcal{E}_{\mathrm{test}}$ has worsened over several
iterations.

2. When the ANN MSE over the samples in $\mathcal{E}_{\mathrm{train}}$ has not improved over 100
iterations.

3. When the ANN MSE over the samples in $\mathcal{E}_{\mathrm{train}}$ has fallen below 0.01.
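The training procedure and its three stopping rules can be sketched for a single hidden layer as below. This is an illustrative full-batch gradient-descent version. The learning rate, weight initialization, bias terms, and the `patience` value standing in for "several iterations" in criterion 1 are all assumptions. Criterion 1 is interpreted here as the test MSE failing to improve on its best value for `patience` consecutive iterations. `train` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(x_train, y_train, x_test, y_test, k, lr=0.5, patience=10,
          stall_limit=100, target_mse=0.01, max_iter=20000, seed=0):
    """Back-propagation on MSE with the three stopping criteria listed above."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x_train.shape[1], k))
    b1 = np.zeros(k)
    w2 = rng.normal(0.0, 0.5, (k, y_train.shape[1]))
    b2 = np.zeros(y_train.shape[1])

    def mse(x, y):
        out = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
        return float(np.mean((out - y) ** 2))

    best_train = best_test = np.inf
    train_stall = test_stall = 0
    for it in range(1, max_iter + 1):
        # Forward pass over the whole training set.
        h = sigmoid(x_train @ w1 + b1)
        o = sigmoid(h @ w2 + b2)
        # Backward pass: MSE gradient propagated through both sigmoid layers.
        d_o = 2.0 * (o - y_train) / y_train.size * o * (1.0 - o)
        d_h = (d_o @ w2.T) * h * (1.0 - h)
        w2 -= lr * h.T @ d_o
        b2 -= lr * d_o.sum(axis=0)
        w1 -= lr * x_train.T @ d_h
        b1 -= lr * d_h.sum(axis=0)

        train_mse, test_mse = mse(x_train, y_train), mse(x_test, y_test)
        if train_mse < target_mse:                                  # criterion 3
            return (w1, b1, w2, b2), "train MSE below target", it
        train_stall = 0 if train_mse < best_train else train_stall + 1
        best_train = min(best_train, train_mse)
        test_stall = 0 if test_mse < best_test else test_stall + 1
        best_test = min(best_test, test_mse)
        if test_stall >= patience:                                  # criterion 1
            return (w1, b1, w2, b2), "test MSE stopped improving", it
        if train_stall >= stall_limit:                              # criterion 2
            return (w1, b1, w2, b2), "train MSE stalled", it
    return (w1, b1, w2, b2), "max_iter reached", max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.vstack([rng.normal(-1.0, 0.3, (20, 2)), rng.normal(1.0, 0.3, (20, 2))])
    y = np.repeat(np.eye(2), 20, axis=0)   # one-hot targets for two toy classes
    _, reason, iterations = train(x, y, x, y, k=4)
    print(reason, iterations)
```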


The whole process described above was repeated nine more times in a 10-fold cross-validation
manner (Kohavi, 1995; Mosteller, 1948), where the samples in each of the sets were changed at
each fold. For example, $a_i \neq a_{i+1}$, where $a_i \in \mathcal{E}_{\mathrm{eval},i}$,
$a_{i+1} \in \mathcal{E}_{\mathrm{eval},i+1}$, $\forall i$, and $\mathcal{E}_{\mathrm{eval},i}$ is the
evaluation set at the $i$th fold.


2.3.3. Evaluating the ANNs 


The accuracies of the ANNs were evaluated over the samples in $\mathcal{E}_{\mathrm{eval}}$.
Accuracy is simply computed as the percentage of correctly identified species. An additional,
secondary evaluation metric called the error rate was computed following a multi-class
confusion matrix (Bavaud et al., 2006; Kohavi, 1995; Maruling et al., 2008). An ANN can identify
all samples of a species correctly (i.e., 100% accuracy for that species) and still have a
non-zero error rate for that species. This happens when samples of other species are
misidentified as that species.
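The quantities read off the confusion matrices in Section 3.4 can be sketched in plain Python:

- Accuracy is the fraction of evaluation samples on the diagonal.
- Per-class accuracy is a diagonal entry divided by its row total.
- A class's false-positive count is its column total minus the diagonal entry, matching the "Total Error" row of the confusion-matrix tables.

The paper does not spell out how its per-class error-rate percentages are normalized, so the sketch reports raw counts. `confusion_metrics` is a hypothetical helper.

```python
def confusion_metrics(cm):
    """cm[r][c] = number of evaluation samples of actual class r identified as class c."""
    n = len(cm)
    if any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    return {
        "accuracy": correct / total,
        "error_rate": 1.0 - correct / total,
        # Share of each actual class that was identified correctly (row-wise).
        "per_class_accuracy": [cm[i][i] / sum(cm[i]) if sum(cm[i]) else 0.0 for i in range(n)],
        # Samples of other classes wrongly assigned to each class (column-wise).
        "false_positives": [sum(cm[r][c] for r in range(n) if r != c) for c in range(n)],
    }


if __name__ == "__main__":
    # Dog matrix from Table 3: eight breeds with 2 samples each, then the Pseudo breed.
    dog = [[2 if r == c else 0 for c in range(9)] for r in range(8)]
    dog.append([0, 0, 1, 0, 0, 0, 0, 0, 1])   # one Pseudo bark identified as Chow Chow
    m = confusion_metrics(dog)
    print(f"accuracy {m['accuracy']:.2%}, Chow Chow false positives {m['false_positives'][2]}")
```

Chow Chow shows the case described above. Every Chow Chow sample is identified correctly (per-class accuracy 1.0), yet one Pseudo sample is assigned to it.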


3. Results and Discussion 

3.1. Collection of audio clips 

The total file sizes of the audio clips collected are 1,100.0 MB, 88.3 MB, and 494.1 MB for birds,
frogs, and dogs, respectively. All audio clips were encoded in the Waveform Audio File Format
(WAV). This format was used because the spectral properties of the sound waves are preserved
when the audio stream is stored as a raw, uncompressed bitstream.
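Reading such files needs nothing beyond Python's standard `wave` module. The sketch below assumes 16-bit PCM WAV files, since the sample width of the original recordings is not stated. It averages multi-channel audio to mono. `read_wav_mono16` and `write_wav_mono16` are hypothetical helper names; the writer is included only to make the example self-contained.

```python
import array
import os
import sys
import tempfile
import wave


def read_wav_mono16(path):
    """Return (sample_rate, samples in [-1, 1)) for a 16-bit PCM WAV file, mixed to mono."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h", raw)         # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()                  # WAV sample data is little-endian
    mono = [sum(samples[i:i + channels]) / channels / 32768.0
            for i in range(0, len(samples), channels)]
    return rate, mono


def write_wav_mono16(path, rate, samples):
    """Write float samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    data = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_wav_mono16(path, 8000, [0.0, 0.5, -0.5, 1.0])
    print(read_wav_mono16(path))
```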


3.2. Spectral properties of vocalizations 

Figure 1 shows the box-and-whisker plot of SV for each bird species. The plot shows that the SV
distributions of the Philippine Bulbul and the Philippine Bush Warbler are extremely skewed to
the right, while those of the Water Pipit and the Rusty Breasted Cuckoo are extremely skewed to
the left. The plot also indicates that SV alone can be a good identifier in three cases: between
the Philippine Bulbul and the Philippine Bush Warbler, between the Philippine Bulbul and the
Water Pipit, and between the Philippine Bulbul and the Pseudo species. The reason is that the
minimum SV of the Philippine Bulbul is greater than the respective maximum SVs of the
Philippine Bush Warbler, the Water Pipit, and the Pseudo species. A single SV threshold
therefore separates each of these pairs. SV alone, however, cannot be used to differentiate
among the Philippine Bush Warbler, the Water Pipit, and the Pseudo species. In fact, no other
pair of bird species can be separated by SV alone. Although SV cannot differentiate all bird
species, it is only one dimension of the 28-dimensional space of spectral properties. The right
combination of properties may differentiate every pair of bird species considered in this study.
Because of space limitations, the remaining 27 spectral properties for birds, as well as the
spectral properties of dogs and frogs, are not described in this paper.
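The range-based reasoning above can be expressed as a small check over per-class minima and maxima. This sketch uses the raw minimum and maximum of each class. A box plot's whiskers may exclude outliers, so results on real data could differ from a visual reading of Figure 1. `separable_pairs` and the values in the example are hypothetical.

```python
from itertools import combinations


def separable_pairs(values_by_class):
    """Return class pairs whose value ranges do not overlap: one class's minimum
    exceeds the other's maximum, so a single threshold on this property splits them."""
    ranges = {name: (min(vals), max(vals)) for name, vals in values_by_class.items()}
    pairs = []
    for a, b in combinations(sorted(ranges), 2):
        (a_min, a_max), (b_min, b_max) = ranges[a], ranges[b]
        if a_min > b_max or b_min > a_max:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical SV values, not the measurements behind Figure 1.
    sv = {"A": [5.0, 6.0, 7.0], "B": [1.0, 2.0, 3.0], "C": [2.5, 4.0]}
    print(separable_pairs(sv))
```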


Figure 1: Box-and-whisker plot of the spectral variability (SV) of bird species, one plot per class
(Bananaquit, Black Crake, Black Hornbill, Eurasian Skylark, European Goldfinch, Philippine
Bulbul, Philippine Bush Warbler, Philippine Drongo Cuckoo, Water Pipit, Rufous-tailed
Hummingbird, Rusty Breasted Cuckoo, Spotted Kingfisher, White Rumped Shama, and Pseudo).
The yellow plot can be visually differentiated from the green plot.


3.3. ANN structure 

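The stepwise forward substitution methods were applied to determine the optimal combination of
the 28 spectral properties for structuring $\mathrm{ANN}_{\mathrm{bird}}$, $\mathrm{ANN}_{\mathrm{dog}}$,
and $\mathrm{ANN}_{\mathrm{frog}}$. They resulted in:

$$\mathrm{ANN}_{\mathrm{bird}} = \mathrm{ANN}(28, [14, 1], 14) \tag{7}$$

$$\mathrm{ANN}_{\mathrm{dog}} = \mathrm{ANN}(4, [9, 1], 9) \tag{8}$$

$$\mathrm{ANN}_{\mathrm{frog}} = \mathrm{ANN}(28, [12, 1], 12) \tag{9}$$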


This means that a combination of all 28 quantified spectral properties is needed to differentiate
bird species and frog species, respectively, with high accuracy. For dogs, however, a combination
of only four spectral properties suffices to differentiate breed barks with high accuracy. These
spectral properties are (1) average MFCC, (2) average MoM, (3) MoM standard deviation, and
(4) average LPC.
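The number of trainable coefficients in each structure can be counted from its layer sizes, which gives a rough sense of model size. This sketch assumes fully connected adjacent layers as described in Section 2.3.1. It optionally adds one bias per hidden and output node, which the paper does not mention. `ann_parameter_count` is a hypothetical name.

```python
def ann_parameter_count(j, k, m, n, biases=True):
    """Count edge weights (plus optional per-node biases) in ANN(j, [k, m], n)."""
    sizes = [j] + [k] * m + [n]
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    return weights + (sum(sizes[1:]) if biases else 0)


if __name__ == "__main__":
    structures = {"bird": (28, 14, 1, 14), "dog": (4, 9, 1, 9), "frog": (28, 12, 1, 12)}
    for name, args in structures.items():
        print(name, ann_parameter_count(*args, biases=False), ann_parameter_count(*args))
```

Without biases this gives 588, 117, and 480 weights for the bird, dog, and frog structures. With biases it gives 616, 135, and 504.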
3.4. Accuracy of ANNs

Tables 2, 3, and 4 show the confusion matrices for $\mathrm{ANN}_{\mathrm{bird}}$,
$\mathrm{ANN}_{\mathrm{dog}}$, and $\mathrm{ANN}_{\mathrm{frog}}$, respectively.

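Table 2: Confusion matrix for the identification of 13 bird species. Rows are the actual species;
columns are the species as determined by ANN_bird. Entries shown as “.” are zero.

                         Bird species as determined by ANN_bird
Bird species               a  b  c  d  e  f  g  h  i  j  k  l  m  n   Total correct
a. Bananaquit              5  .  .  .  .  .  .  .  .  .  .  .  .  .   5
b. Black Crake             .  3  .  .  .  .  .  .  1  .  .  .  .  1   3
c. Black Hornbill          .  .  2  .  .  1  .  .  .  .  .  1  .  1   2
d. Eurasian Skylark        .  .  .  5  .  .  .  .  .  .  .  .  .  .   5
e. E. Goldfinch            .  .  .  .  4  .  .  .  1  .  .  .  .  .   4
f. Phil. Bulbul            .  .  .  .  .  4  1  .  .  .  .  .  .  .   4
g. Phil. Bush Warbler      .  .  .  .  1  1  1  .  .  1  .  .  1  .   1
h. Phil. D. Cuckoo         1  .  .  .  .  .  .  4  .  .  .  .  .  .   4
i. Water Pipit             .  .  .  .  .  .  .  .  5  .  .  .  .  .   5
j. R.-t. Hummingbird       .  .  .  .  .  .  1  .  1  3  .  .  .  .   3
k. R. B. Cuckoo            .  .  .  .  .  .  .  2  .  .  3  .  .  .   3
l. S. Kingfisher           .  .  .  .  .  .  .  .  .  .  1  4  .  .   4
m. W. R. Shama             .  2  1  .  .  .  .  .  .  .  .  .  2  .   2
n. Pseudo                  .  .  .  .  .  .  .  .  .  .  .  .  .  5   5
Total error                1  2  1  0  1  2  2  2  3  1  1  1  1  2   50

Overall error rate (%): 28.57
Overall accuracy (%): 71.43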


$\mathrm{ANN}_{\mathrm{bird}}$ was able to identify 100% of the Eurasian Skylark samples without
misidentifying any other species as Eurasian Skylark (i.e., its error rate for Eurasian Skylark is
0%). Bananaquit samples were also identified with 100% accuracy, but one Philippine Drongo
Cuckoo was mistaken for a Bananaquit (i.e., an error rate of 1.82%). All Water Pipit samples
were likewise identified correctly. However, one Black Crake, one European Goldfinch, and one
Rufous-tailed Hummingbird were mistakenly identified as Water Pipits. Thus,
$\mathrm{ANN}_{\mathrm{bird}}$ has an error rate of 5.45% for identifying Water Pipit. The Pseudo
species was also identified with 100% accuracy, but one Black Crake and one Black Hornbill
were also identified as Pseudo species (i.e., an error rate of 3.64%). Only one of the five
Philippine Bush Warbler samples was correctly identified by $\mathrm{ANN}_{\mathrm{bird}}$ (i.e., an
accuracy of 20%). It also identified one Philippine Bulbul and one Rufous-tailed Hummingbird as
Philippine Bush Warblers (i.e., an error rate of 3.64%). The overall accuracy of
$\mathrm{ANN}_{\mathrm{bird}}$ is 71.43%, while its overall error rate is 28.57%. With regard to
identifying bird species, the null hypothesis $H_0$ is rejected and the alternate hypothesis $H_1$
is accepted in its stead.

$\mathrm{ANN}_{\mathrm{dog}}$ was able to identify all dog breeds correctly (i.e., 100% accuracy for
each breed; Table 3). It correctly identified the Pseudo breed 50% of the time. The other half of
the Pseudo breed samples were erroneously identified as Chow Chow (i.e., the error rate of
identifying Chow Chow is 5.56%). The error rate for identifying all the other breeds, including
the Pseudo breed, is 0%. The overall accuracy of $\mathrm{ANN}_{\mathrm{dog}}$ is 94.44% and its
overall error rate is 5.56%. With these results, the alternate hypothesis $H_1$ is accepted instead
of the null hypothesis $H_0$.

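Table 3: Confusion matrix for the identification of eight dog breeds. Rows are the actual breeds;
columns are the breeds as determined by ANN_dog. Entries shown as “.” are zero.

                         Dog breed as determined by ANN_dog
Dog breed                  a  b  c  d  e  f  g  h  i   Total correct
a. Beagle                  2  .  .  .  .  .  .  .  .   2
b. Chihuahua               .  2  .  .  .  .  .  .  .   2
c. Chow Chow               .  .  2  .  .  .  .  .  .   2
d. Labrador Retriever      .  .  .  2  .  .  .  .  .   2
e. Pomeranian              .  .  .  .  2  .  .  .  .   2
f. Poodle                  .  .  .  .  .  2  .  .  .   2
g. Shih Tzu                .  .  .  .  .  .  2  .  .   2
h. Siberian Husky          .  .  .  .  .  .  .  2  .   2
i. Pseudo                  .  .  1  .  .  .  .  .  1   1
Total error                0  0  1  0  0  0  0  0  0   17

Overall error rate (%): 5.56
Overall accuracy (%): 94.44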


All frog species in Table 4, except for the Pseudo species, were identified correctly by
$\mathrm{ANN}_{\mathrm{frog}}$ (i.e., 100% accuracy for each species and 0% accuracy for the
Pseudo species). $\mathrm{ANN}_{\mathrm{frog}}$ identified all Pseudo species samples as B. luteus.
Thus, $\mathrm{ANN}_{\mathrm{frog}}$ has an error rate of 9.09% for identifying B. luteus.
$\mathrm{ANN}_{\mathrm{frog}}$ has an overall accuracy of 90.91% and an overall error rate of
9.09%. The accuracy of $\mathrm{ANN}_{\mathrm{frog}}$ supports accepting the alternate hypothesis
and rejecting the null hypothesis.


4. Summary and Conclusion 

Three ANNs for identifying 13 bird species, eight dog breeds, and 11 frog species, respectively,
were structurally optimized, trained, and evaluated. The animals were identified through the 28
quantifiable spectral properties of their vocalizations. Identifying bird and frog species requires
all 28 properties, while identifying dog breeds requires only four. These four are average MFCC,
the average and standard deviation of MoM, and average LPC. The accuracies of identifying bird
species, dog breeds, and frog species are 71.43%, 94.44%, and 90.91%, respectively. The alternate
hypotheses state that there exist combinations of the spectral properties of these three animals'
vocalizations that can differentiate among their respective breeds or species with at least 70%
accuracy. These hypotheses are therefore accepted. Thus, using bioacoustic methods, the sounds
produced by animals during their vocalizations can be used as identifiers of species, and the
process can be automated by ANNs.
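Table 4: Confusion matrix for the identification of 11 frog species. Rows are the actual species;
columns are the species as determined by ANN_frog. Entries shown as “.” are zero.

                         Frog species as determined by ANN_frog
Frog species               a  b  c  d  e  f  g  h  i  j  k  l   Total correct
a. B. luteus               2  .  .  .  .  .  .  .  .  .  .  .   2
b. B. marinus              .  2  .  .  .  .  .  .  .  .  .  .   2
c. F. limnocharis          .  .  2  .  .  .  .  .  .  .  .  .   2
d. H. glandulosa           .  .  .  2  .  .  .  .  .  .  .  .   2
e. K. baleata              .  .  .  .  2  .  .  .  .  .  .  .   2
f. K. pulchra              .  .  .  .  .  2  .  .  .  .  .  .   2
g. M. butleri              .  .  .  .  .  .  2  .  .  .  .  .   2
h. O. hosii                .  .  .  .  .  .  .  2  .  .  .  .   2
i. P. aspera               .  .  .  .  .  .  .  .  2  .  .  .   2
j. P. leucomystax          .  .  .  .  .  .  .  .  .  2  .  .   2
k. R. catesbeiana          .  .  .  .  .  .  .  .  .  .  2  .   2
l. Pseudo                  2  .  .  .  .  .  .  .  .  .  .  0   0
Total error                2  0  0  0  0  0  0  0  0  0  0  0   20

Overall error rate (%): 9.09
Overall accuracy (%): 90.91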


One of the implications of this result is that species inventory procedures for biodiversity
recording may be augmented and enhanced through a technique called “crowdsourcing”
(Estellés-Arolas and González-Ladrón-de-Guevara, 2012). This technique is similar to the
Christmas Day Bird Census, which was started in 1900 and has become an annual tradition among
the members of the National Audubon Society (Robbins, 2015). In crowdsourcing, however, the
participants are not only bird enthusiasts but also ordinary people equipped with modern-day
smartphones. Applications on these phones can automatically record animal vocalizations,
identify the animals, and send the dated and geo-located information to a central database. The
application may carry out this automated process transparently, without user intervention. This
would be an improvement over what Huetz and Aubin (2012) proposed.